Continuous Control With Deep Reinforcement Learning
- Authors: Lillicrap, T.P.; Hunt, J.J.; Pritzel, A.; Heess, N.; Erez, T.; Tassa, Y.; Silver, D.; Wierstra, D.
- Venue: ICLR
- Year: 2016
- Reviewed by: Joe Kershaw
Overview
This article focuses on applying deep reinforcement learning to continuous control problems of the kind found in physical systems.
Specific Problem
Reinforcement learning has been highly valued for its ability to observe a scenario and learn to solve problems within it. Although deep Q-learning methods can handle very high-dimensional observations, they do not show the same aptitude in the physical domain. This is because they choose an action by searching over a discrete set of options at every step rather than producing an action directly, which breaks down when the actions themselves are continuous. Acting in the real world requires continuous observations and continuous actions rather than the simplified, discretized ones of game-like settings, and discretizing a robot's actions makes the action space grow exponentially with the number of degrees of freedom. This paper combines recent advances in reinforcement learning techniques to tune and extend the method so that it can act in such continuous, physical spaces.
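As a rough back-of-the-envelope illustration (my own sketch, in the spirit of the example the authors give, not code from the paper): even a very coarse discretization of a multi-joint arm's torques produces an unmanageably large discrete action set.

```python
# Illustration: discretizing each joint of a robot arm into a few torque levels
# makes the number of joint-space actions grow exponentially with the joints.
levels_per_joint = 3  # e.g. {-max, 0, +max} torque per joint
for num_joints in (2, 4, 7):
    num_actions = levels_per_joint ** num_joints
    print(f"{num_joints} joints -> {num_actions} discrete actions")
# 7 joints already give 2187 actions, and a finer discretization blows up even faster.
```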
Solutions
Replace the greedy, argmax-over-actions policy of Q-learning with an actor-critic architecture. Finding the greedy action requires an optimization over the whole action space at every step, which is far too slow when the actions are continuous and high-dimensional; a deterministic actor network instead outputs the action directly, and a critic network scores it (see the sketch below).
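A minimal sketch of what this looks like, assuming PyTorch (the layer sizes and names here are my own illustration, not the paper's exact architecture): the actor maps a state directly to a continuous action, and the critic scores state-action pairs, so no argmax over actions is ever needed.

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Deterministic policy: maps a state directly to a continuous action."""
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, action_dim), nn.Tanh(),  # actions bounded in [-1, 1]
        )
    def forward(self, state):
        return self.net(state)

class Critic(nn.Module):
    """Q(s, a): scores the actor's chosen action instead of enumerating all actions."""
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )
    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))
```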
Utilize a replay buffer and minibatch learning so that training samples are drawn from across many past time steps. Consecutive observations generated while exploring an environment are strongly correlated in time, while most network optimizers assume independently drawn samples; sampling minibatches at random from a large buffer of stored transitions breaks that correlation.
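A minimal replay-buffer sketch (my own hypothetical code, not from the paper): transitions are stored as they are experienced and later sampled uniformly at random in minibatches.

```python
import random
from collections import deque

class ReplayBuffer:
    """Stores transitions so training minibatches can be sampled at random,
    breaking the correlation between consecutive steps of an episode."""
    def __init__(self, capacity=1_000_000):
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=64):
        return random.sample(self.buffer, batch_size)
```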
Q-learning with neural network function approximators was shown to be unstable, because the network being updated is also used to compute its own training targets. To remedy this, a separate target copy of each network is kept; its weights track the active network only through small, damped updates, which keeps the training targets slowly changing and avoids instability.
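A sketch of that damped ("soft") update, again assuming PyTorch-style parameter access; the small mixing factor (the paper calls it tau) controls how slowly the target copy follows the active network.

```python
def soft_update(target_net, source_net, tau=0.001):
    """Move the target network a small step toward the learned network.
    The slowly changing copy provides stable targets for the critic's updates."""
    for t_param, s_param in zip(target_net.parameters(), source_net.parameters()):
        t_param.data.copy_(tau * s_param.data + (1.0 - tau) * t_param.data)
```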
In order to cope with environments whose observations have very different scales and units, batch normalization was applied within the networks. Exploration was handled separately by adding noise to the actions produced by the actor network (a temporally correlated noise process), since a purely deterministic policy would otherwise never try anything new.
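For the exploration side, a small sketch of temporally correlated noise added to the actor's output action (an Ornstein-Uhlenbeck-style process; the parameter values here are illustrative, not taken from the paper's experiments):

```python
import numpy as np

class OUNoise:
    """Temporally correlated noise added to the actor's action for exploration."""
    def __init__(self, action_dim, theta=0.15, sigma=0.2):
        self.theta, self.sigma = theta, sigma
        self.state = np.zeros(action_dim)

    def sample(self):
        # Mean-reverting random walk: drifts back toward zero, plus Gaussian kicks.
        self.state += -self.theta * self.state + self.sigma * np.random.randn(*self.state.shape)
        return self.state

# During training: exploration_action = actor(state) + noise.sample()
```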
Results
The algorithm was evaluated in simulated environments on a variety of tasks, including driving and legged locomotion scenarios. These simulations could provide either a rendered visual observation (i.e., raw pixels as on a computer screen) or low-dimensional virtual sensor feedback such as joint angles. Batch normalization turned out to be a key feature of the technique. Additionally, for many tasks the only input required was the image of the environment, with no additional sensor feedback. This could be promising for difficult sensing environments where only visuals are viable.
Additional comments
Though this paper was published in 2016, it already feels dated. The concepts lauded as novel, such as batch normalization, were part of my introduction to machine learning in 2019. It is striking how rapidly deep learning applications are evolving, yet reinforcement learning for real-time control is still being refined today.